    Mediating between Incompatible Tagsets

    Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora AEPC 2010. Editors: Lars Ahrenberg, Jörg Tiedemann and Martin Volk. NEALT Proceedings Series, Vol. 10 (2010), 53-62. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15893

    Vladimír „Niki“ Petkevič


    Improvements to Korektor: A Case Study with Native and Non-Native Czech

    Abstract: We present recent developments of Korektor, a statistical spell checking system. In addition to a lexicon, Korektor uses language models to find real-word errors, which are detectable only in context. The models and error probabilities, learned from error corpora, are also used to suggest the most likely corrections. Korektor was originally trained on a small error corpus and used language models extracted from an in-house corpus, WebColl. We show two recent improvements:
    • We built new language models from freely available (shuffled) versions of the Czech National Corpus and show that these perform consistently better on texts produced both by native speakers and by non-native learners of Czech.
    • We trained new error models on a manually annotated learner corpus and show that they perform better than the standard error model (in error detection) not only for the learners' texts, but also for our standard evaluation data of native Czech. For error correction, the standard error model outperformed the non-native models in 2 out of 3 test datasets.
    We discuss reasons for this not-quite-intuitive improvement. Based on these findings and on an analysis of errors in both native and learners' Czech, we propose directions for further improvements of Korektor.
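The abstract describes combining an error model with a language model to catch real-word errors. The following is a minimal noisy-channel sketch of that general idea, not Korektor's actual implementation: the bigram counts, the error probabilities, the English example words, and the function names `bigram_logprob` and `correct` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical bigram counts standing in for a trained language model.
BIGRAM_COUNTS = Counter({
    ("a", "piece"): 40, ("piece", "of"): 40,
    ("a", "peace"): 1, ("peace", "of"): 1,
    ("world", "peace"): 30, ("peace", "is"): 30,
    ("of", "cake"): 5,
})
VOCAB = {word for pair in BIGRAM_COUNTS for word in pair}

# How often each word occurs as the left element of a bigram.
LEFT_COUNTS = Counter()
for (left, _right), n in BIGRAM_COUNTS.items():
    LEFT_COUNTS[left] += n

# Hypothetical error model: P(typed word | intended word),
# keyed as (typed, intended).
ERROR_MODEL = {
    ("peace", "peace"): 0.95,  # the writer meant "peace" and typed it
    ("peace", "piece"): 0.05,  # the writer meant "piece" but typed "peace"
}


def bigram_logprob(left, right):
    """Add-one smoothed log P(right | left)."""
    return math.log(
        (BIGRAM_COUNTS[(left, right)] + 1) / (LEFT_COUNTS[left] + len(VOCAB))
    )


def correct(prev, typed, nxt, candidates):
    """Pick the candidate maximising error-model score plus context score."""
    def score(candidate):
        channel = math.log(ERROR_MODEL.get((typed, candidate), 1e-6))
        context = bigram_logprob(prev, candidate) + bigram_logprob(candidate, nxt)
        return channel + context
    return max(candidates, key=score)


# "peace" is a valid word, so a lexicon lookup alone would not flag it;
# the surrounding context is what favours "piece" here.
print(correct("a", "peace", "of", ["peace", "piece"]))      # prints "piece"
print(correct("world", "peace", "is", ["peace", "piece"]))  # prints "peace"
```

In this toy setup the typed word is kept whenever the context supports it, and replaced only when the language-model gain outweighs the error-model penalty, which is the trade-off that retraining either model would shift.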

    Compiling and annotating a learner corpus for a morphologically rich language: CzeSL, a corpus of non-native Czech

    Learner corpora, linguistic collections documenting a language as used by learners, provide an important empirical foundation for language acquisition research and teaching practice. This book presents CzeSL, a corpus of non-native Czech, against the background of theoretical and practical issues in current learner corpus research. Languages with rich morphology and relatively free word order, including Czech, are particularly challenging for the analysis of learner language. The authors address both the complexity of learner error annotation, describing three complementary annotation schemes, and the complexity of describing non-native Czech in terms of standard linguistic categories. The book discusses in detail the practical aspects of corpus creation: the process of collection and annotation itself, the supporting tools, the resulting data, their formats and search platforms. The chapter on use cases exemplifies the usefulness of learner corpora for teaching, language acquisition research, and computational linguistics. Any researcher developing learner corpora will surely appreciate the concluding chapter listing lessons learned and pitfalls to avoid.

    On the tree-transformation power of XSLT

    XSLT is a standard rule-based programming language for expressing transformations of XML data. The language is currently in transition from version 1.0 to 2.0. In order to understand the computational consequences of this transition, we restrict XSLT to its pure tree-transformation capabilities. Under this focus, we observe that XSLT 1.0 was not yet a computationally complete tree-transformation language: every 1.0 program can be implemented in exponential time. A crucial new feature of version 2.0, however, which allows nodesets over temporary trees, yields completeness. We provide a formal operational semantics for XSLT programs, and establish confluence for this semantics.

    ECMO for COVID-19 patients in Europe and Israel

    Since March 15th, 2020, 177 centres from Europe and Israel have joined the study, routinely reporting on the ECMO support they provide to COVID-19 patients. The mean annual number of cases treated with ECMO in the participating centres before the pandemic (2019) was 55. The number of COVID-19 patients has increased rapidly each week, reaching 1531 treated patients as of September 14th. The greatest number of cases has been reported from France (n = 385), UK (n = 193), Germany (n = 176), Spain (n = 166), and Italy (n = 136). The mean age of treated patients was 52.6 years (range 16–80), and 79% were male. The ECMO configuration used was VV in 91% of cases, VA in 5% and other in 4%. The mean PaO2 before ECMO implantation was 65 mmHg. The mean duration of ECMO support thus far has been 18 days and the mean ICU length of stay of these patients was 33 days. As of September 14th, overall 841 patients have been weaned from ECMO support, 601 died during ECMO support, 71 died after withdrawal of ECMO, 79 are still receiving ECMO support, and the status of 10 patients is not available. Our preliminary data suggest that patients placed on ECMO with severe refractory respiratory or cardiac failure secondary to COVID-19 have a reasonable (55%) chance of survival. Further extensive data analysis is expected to provide invaluable information on the demographics, severity of illness, indications and different ECMO management strategies in these patients.

    Measurement of $\Upsilon$ production in $pp$ collisions at $\sqrt{s}=2.76$ TeV

    The production of $\Upsilon(1S)$, $\Upsilon(2S)$ and $\Upsilon(3S)$ mesons decaying into the dimuon final state is studied with the LHCb detector using a data sample corresponding to an integrated luminosity of 3.3 pb$^{-1}$ collected in proton-proton collisions at a centre-of-mass energy of $\sqrt{s}=2.76$ TeV. The differential production cross-sections times dimuon branching fractions are measured as functions of the $\Upsilon$ transverse momentum and rapidity, over the ranges $p_{\rm T}$ [...] $\sigma(pp \to \Upsilon(1S) X) \times B(\Upsilon(1S) \to \mu^+\mu^-) = 1.111 \pm 0.043 \pm 0.044$ nb, $\sigma(pp \to \Upsilon(2S) X) \times B(\Upsilon(2S) \to \mu^+\mu^-) = 0.264 \pm 0.023 \pm 0.011$ nb, $\sigma(pp \to \Upsilon(3S) X) \times B(\Upsilon(3S) \to \mu^+\mu^-) = 0.159 \pm 0.020 \pm 0.007$ nb, where the first uncertainty is statistical and the second systematic.
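Each result above is quoted with separate statistical and systematic uncertainties. As a reading aid, the sketch below combines the two in quadrature, which is a common convention when the uncertainties are treated as independent. That independence is an assumption here, not something the abstract states, and the helper name `total_uncertainty` is hypothetical.

```python
import math

# (central value, statistical, systematic) in nb, copied from the abstract.
MEASUREMENTS = {
    "Upsilon(1S)": (1.111, 0.043, 0.044),
    "Upsilon(2S)": (0.264, 0.023, 0.011),
    "Upsilon(3S)": (0.159, 0.020, 0.007),
}


def total_uncertainty(stat, syst):
    """Quadrature sum sqrt(stat^2 + syst^2), assuming independent sources."""
    return math.hypot(stat, syst)


for state, (value, stat, syst) in MEASUREMENTS.items():
    total = total_uncertainty(stat, syst)
    print(f"{state}: {value:.3f} +/- {total:.3f} nb (combined)")
```

The combined figure is only a convenience for rough comparisons; the split into statistical and systematic parts, as given in the abstract, carries more information.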

    Studies of beauty baryon decays to $D^0 p h^-$ and $\Lambda_c^+ h^-$ final states

    A study of CP violation in $B^\pm \to D K^\pm$ and $B^\pm \to D \pi^\pm$ decays with $D \to K^0_{\rm S} K^\pm \pi^\mp$ final states

    A first study of CP violation in the decay modes $B^\pm\to [K^0_{\rm S} K^\pm \pi^\mp]_D h^\pm$ and $B^\pm\to [K^0_{\rm S} K^\mp \pi^\pm]_D h^\pm$, where $h$ labels a $K$ or $\pi$ meson and $D$ labels a $D^0$ or $\overline{D}^0$ meson, is performed. The analysis uses the LHCb data set collected in $pp$ collisions, corresponding to an integrated luminosity of 3 fb$^{-1}$. The analysis is sensitive to the CP-violating CKM phase $\gamma$ through seven observables: one charge asymmetry in each of the four modes and three ratios of the charge-integrated yields. The results are consistent with measurements of $\gamma$ using other decay modes.